Sørensen similarity index

The Sørensen index, also known as Sørensen’s similarity coefficient, is a statistic used for comparing the similarity of two samples. It was developed by the botanist Thorvald Sørensen and published in 1948.[1]

The name is often misspelled as Sorenson index, Soerenson index or Sörenson index; the same variants also occur with the correct ending -sen.

Sørensen's original formula was intended to be applied to presence/absence data, and is

 QS = \frac{2C}{A + B}

where A and B are the numbers of species in samples A and B, respectively, and C is the number of species shared by the two samples; QS is the quotient of similarity and ranges from 0 to 1. This expression is easily extended to abundance data instead of presence/absence of species. The quantitative version of the Sørensen index is also known as the Czekanowski index. The Sørensen index is identical to Dice's coefficient,[2] which likewise always lies in the range [0, 1]. Used as a distance measure, 1 − QS, the Sørensen index is identical to Bray–Curtis dissimilarity[3] when applied to quantitative data.
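The formula can be illustrated with a minimal Python sketch. The function names, the example species lists and the convention of returning 1.0 for two empty samples are assumptions made for illustration only. The quantitative function uses the commonly cited min-based form, 2·Σ min(a_i, b_i) / (Σ a_i + Σ b_i), which the text above does not spell out.

```python
from typing import Hashable, Iterable, Sequence


def sorensen_index(sample_a: Iterable[Hashable], sample_b: Iterable[Hashable]) -> float:
    """Presence/absence Sørensen index, QS = 2C / (A + B)."""
    species_a, species_b = set(sample_a), set(sample_b)
    total = len(species_a) + len(species_b)  # A + B
    if total == 0:
        # QS is undefined (0/0) for two empty samples; 1.0 is an assumed convention.
        return 1.0
    shared = len(species_a & species_b)  # C
    return 2 * shared / total


def sorensen_quantitative(abund_a: Sequence[float], abund_b: Sequence[float]) -> float:
    """Quantitative (abundance) version, assuming the min-based form.

    Both sequences must list abundances for the same species in the same order.
    """
    if len(abund_a) != len(abund_b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(abund_a) + sum(abund_b)
    if total == 0:
        # Same assumed convention as above for two empty samples.
        return 1.0
    shared = sum(min(a, b) for a, b in zip(abund_a, abund_b))
    return 2 * shared / total


if __name__ == "__main__":
    # Hypothetical samples: A = 4 species, B = 3 species, C = 2 shared species.
    site_a = ["oak", "birch", "pine", "heather"]
    site_b = ["birch", "pine", "spruce"]
    print(sorensen_index(site_a, site_b))  # 2*2 / (4 + 3) = 4/7

    # Hypothetical abundances for the same four species in each sample.
    counts_a = [6, 7, 4, 0]
    counts_b = [10, 0, 6, 2]
    qs = sorensen_quantitative(counts_a, counts_b)
    print(qs, 1 - qs)  # similarity 20/35, distance 15/35
```

For the abundance vectors in this example, the distance 1 − QS equals Σ|a_i − b_i| / Σ(a_i + b_i) = 15/35, which is the usual form of the Bray–Curtis dissimilarity.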

The Sørensen coefficient is mainly useful for ecological community data (e.g. Looman & Campbell, 1960[4]). Justification for its use is primarily empirical rather than theoretical, although it can be justified theoretically as the intersection of two fuzzy sets.[5] Compared with Euclidean distance, Sørensen distance retains sensitivity in more heterogeneous data sets and gives less weight to outliers.[6]

References

  1. ^ Sørensen, T. (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Biologiske Skrifter / Kongelige Danske Videnskabernes Selskab, 5 (4): 1–34.
  2. ^ www.sekj.org/PDF/anbf40/anbf40-415.pdf
  3. ^ Bray, J.R. and Curtis, J.T. (1957) An ordination of the upland forest communities of southern Wisconsin. Ecological Monographs 27 (4): 326–349.
  4. ^ Looman, J. and Campbell, J.B. (1960) Adaptation of Sorensen's K (1948) for estimating unit affinities in prairie vegetation. Ecology 41 (3): 409–416.
  5. ^ Roberts, D.W. (1986) Ordination on the basis of fuzzy set theory. Vegetatio 66 (3): 123–131.
  6. ^ McCune, B. and Grace, J. (2002) Analysis of Ecological Communities. MjM Software Design. ISBN 0972129006.